Abstract. We study the joint limit distribution of the $k$ largest eigenvalues of a $p \times p$ sample covariance matrix $XX^T$ based on a large $p \times n$ matrix $X$. The rows of $X$ are given by independent copies of a linear process, $X_{it} = \sum_j c_j Z_{i,t-j}$, with regularly varying noise $(Z_{it})$ with tail index $\alpha \in (0, 2)$. It is shown that the point process based on the eigenvalues of $XX^T$ converges in distribution to a Poisson point process with intensity measure depending on $\alpha$ and $\sum_j c_j^2$. This result is extended to random coefficient models where the coefficients of the linear processes $(X_{it})$ are given by $c_j(\theta_i)$, for some ergodic sequence $(\theta_i)$, and thus vary in each row of $X$. As a by-product, we obtain a proof of the corresponding result for matrices with iid entries in cases where $p/n$ goes to zero or infinity.



1. Introduction 

Recently there has been increasing interest in studying large dimensional data sets that arise in 
finance, wireless communications, genetics and other fields. Patterns in these data can often be summarized by the sample covariance matrix, as done in multivariate regression and dimension reduction via factor analysis. Therefore, our objective is to study the asymptotic behavior of the eigenvalues $\lambda_{(1)} \geq \ldots \geq \lambda_{(p)}$ of a $p \times p$ sample covariance matrix $XX^T$, where the data matrix $X$ is obtained from $n$ observations of a high-dimensional stochastic process with values in $\mathbb{R}^p$. Classical results in this
direction often assume that the entries of X are independent and identically distributed (iid) or satisfy 
high moment conditions. Our goal is to weaken the moment conditions by allowing for heavy-tails, 
and the assumption of independent entries by allowing for dependence within the rows and columns. 
Potential applications arise in portfolio management in finance, where observations typically have 
heavy-tails and dependence. 

Assuming that the data come from a multivariate normal distribution, one is able to compute the joint distribution of the eigenvalues $(\lambda_{(1)}, \ldots, \lambda_{(p)})$, see [16]. Under the additional premise that the dimension $p$ is fixed while the sample size $n$ goes to infinity, Anderson [2] obtains a central limit like theorem for the largest eigenvalue. Clearly, it is not possible to derive the joint distribution in a general setting where the distribution of $X$ is not invariant with respect to orthogonal transformations. Furthermore, since in modern applications with large dimensional data sets $p$ might be of similar or even larger order than $n$, it might be more suitable to assume that both $p$ and $n$ go to infinity, so Anderson's result may not be a good approximation in this setting. For example, considering a financial index like the S&P 500, the number of stocks is $p = 500$, whereas, if daily returns of the past 5 years are given, $n$ is only around 1300. In genetic studies, the number of investigated genes $p$ might easily exceed the number of participating individuals $n$ by several orders of magnitude. In this large $n$, large $p$ framework results differ dramatically from the corresponding fixed $p$, large $n$ results, with major consequences for the statistical analysis of large data sets [17].

The study of spectral properties of large dimensional random matrices is one of many topics that have become known under the banner Random Matrix Theory (RMT). The original motivation for RMT comes from mathematical physics [12], [31], where large random matrices serve as a finite-dimensional approximation of infinite-dimensional operators. Its importance for statistics comes from the fact that RMT may be used to correct traditional tests or estimators which fail in the 'large $n$, large $p$' setting. For example, Bai et al. [4] give corrections on some likelihood ratio tests that fail even for moderate $p$ (around 20), and El Karoui [13] consistently estimates the spectrum of a large dimensional covariance matrix using RMT. Thus statistical considerations will be our motivation for a random matrix model with heavy-tailed and dependent entries.

Before describing our results, we will give a brief overview of some of the key results from RMT for real-valued sample covariance matrices $XX^T$. A more detailed account on RMT can be found, for instance, in the textbooks [1], [5], or [21]. Here $X$ is a real $p \times n$ random matrix, and $p$ and $n$ go to infinity simultaneously. Let us first assume that the entries of $X$ are iid with variance 1. Results on the global behavior of the eigenvalues of $XX^T$ mostly concern the spectral distribution, that is, the random probability measure of its eigenvalues $p^{-1} \sum_{i=1}^{p} \epsilon_{n^{-1}\lambda_i}$, where $\epsilon$ denotes the Dirac measure. The spectral distribution converges, as $n, p \to \infty$ with $p/n \to \gamma \in (0, 1]$, to a deterministic measure with density function

$$\frac{1}{2\pi x \gamma} \sqrt{(x_+ - x)(x - x_-)}\, \mathbf{1}_{(x_-, x_+)}(x), \qquad x_\pm := (1 \pm \sqrt{\gamma})^2,$$

where $\mathbf{1}$ denotes the indicator function. This is the so-called Marčenko-Pastur law [20], [30]. One obtains a different result if $XX^T$ is perturbed via an affine transformation [20], [23]. Based on these results, [25] treats the case where the rows of $X$ are given by independent copies of a linear process. Apart from a few special cases, the limiting spectral distribution is not known in a closed form if the entries of $X$ are not independent.
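As a quick numerical sanity check of the density above, the following sketch integrates it with a simple midpoint rule. The function names `mp_density` and `integrate_mp` and the choice $\gamma = 0.5$ are illustrative assumptions, not part of the original text. Under the stated normalization (entries with variance 1) one would expect total mass 1 and mean 1.

```python
import math

def mp_density(x, gamma):
    """Marcenko-Pastur density for a ratio gamma in (0, 1] and unit entry variance."""
    x_minus = (1.0 - math.sqrt(gamma)) ** 2
    x_plus = (1.0 + math.sqrt(gamma)) ** 2
    if x <= x_minus or x >= x_plus:
        return 0.0
    return math.sqrt((x_plus - x) * (x - x_minus)) / (2.0 * math.pi * x * gamma)

def integrate_mp(gamma, weight=lambda x: 1.0, steps=20000):
    """Midpoint-rule integral of weight(x) * density(x) over the support (x_-, x_+)."""
    x_minus = (1.0 - math.sqrt(gamma)) ** 2
    x_plus = (1.0 + math.sqrt(gamma)) ** 2
    h = (x_plus - x_minus) / steps
    total = 0.0
    for i in range(steps):
        x = x_minus + (i + 0.5) * h
        total += weight(x) * mp_density(x, gamma)
    return total * h

total_mass = integrate_mp(0.5)                      # should be close to 1
mean = integrate_mp(0.5, weight=lambda x: x)        # should be close to 1
print(total_mass, mean)
```

The midpoint rule is used only because it avoids evaluating the density at the endpoints, where the square root vanishes.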

Although the eigenvalues of $XX^T$ offer various interesting local properties to be studied, we will only focus on the joint asymptotic behavior of the $k$ largest eigenvalues $(\lambda_{(1)}, \ldots, \lambda_{(k)})$, $k \in \mathbb{N}$. This is motivated from a statistical point of view since the variances of the first $k$ principal components are given by the $k$ largest eigenvalues of the covariance matrix. Geman [15] shows, assuming that the entries of $X$ are iid and have finite fourth moments, that $n^{-1}\lambda_{(1)}$ converges to $x_+ = (1 + \sqrt{\gamma})^2$ almost surely if $p/n \to \gamma \in (0, \infty)$. Moreover, if the entries of $X$ are iid standard Gaussian, Johnstone [17] shows that

$$\frac{\lambda_{(1)} - \mu_{n,p}}{\sigma_{n,p}} \xrightarrow{D} \xi, \qquad \mu_{n,p} = \left(\sqrt{n-1} + \sqrt{p}\right)^2, \quad \sigma_{n,p} = \left(\sqrt{n-1} + \sqrt{p}\right)\left(\frac{1}{\sqrt{n-1}} + \frac{1}{\sqrt{p}}\right)^{1/3},$$

where $\xi$ follows the Tracy-Widom distribution with $\beta = 1$. Soshnikov extends this to more general symmetric non-Gaussian distributions if the matrix $X$ is nearly square, and obtains a similar result for the joint convergence of the $k$ largest eigenvalues. The Tracy-Widom distribution first appeared as the limit of the largest eigenvalue of a Gaussian Wigner matrix [29]. Péché [24] shows that the assumption of Gaussianity in Johnstone's result can be replaced by the assumption that the entries of $X$ have a symmetric distribution with sub-Gaussian tails, and she allows for $\gamma$ being zero or infinity.

There exist results on extreme eigenvalues of $XX^T$ which include dependence within the rows or columns of $X$, but most of them are only valid if $X$ has complex-valued entries whose real and imaginary parts both have a non-zero variance. A notable exception, where the real-valued case is considered, is [9]. They assume that the rows of $X$ are normally distributed with a covariance matrix which has exactly one eigenvalue not equal to one.

In contrast to the light-tailed case described above, there exist only a handful of articles dealing with sample covariance matrices obtained from heavy-tailed observations. Assume that the entries of $X$ are iid and regularly varying with tail index $\alpha \in (0, 4)$. This implies that the entries of $X$ have infinite fourth moments. For $0 < \alpha < 2$ and $p/n \to \gamma \in (0, \infty)$, the limiting spectral distribution of $XX^T$ is computed by Belinschi et al. [6]. In the same framework, Soshnikov [28] shows that the point process of normalized eigenvalues converges to a Poisson process. This implies a Fréchet limiting distribution for the normalized largest eigenvalue, which is in sharp contrast to the Tracy-Widom result in the light-tailed case. The last result also holds if $2 \le \alpha < 4$, see Auffinger et al. [3].

We extend Soshnikov's result by allowing for dependent entries. More specifically, the rows of $X$ are given by independent copies of some linear process. Their respective coefficients can either all be equal (Section 2.1) or, more generally, conditionally on a latent process, vary in each row (Section 2.3). In the latter case the rows of $X$ are not necessarily independent. The limiting Poisson process of the eigenvalues of $XX^T$ depends on the tail index $\alpha$ as well as the coefficients of the observed linear processes. As a by-product, we obtain an independent proof of Soshnikov's result for iid entries which also holds in cases where $\gamma \in \{0, \infty\}$.

The paper is organized as follows. The main results will be presented in Section 2 while the proofs will be given in Section 3. Results from the theory of point processes and regular variation are required through most of this paper. A detailed account on both topics can be found in a number of texts. We mainly adopt the setting, including notation and terminology, of Resnick [26].

2. Main results on heavy-tailed random matrices with dependent entries 

2.1. A first result on the largest eigenvalue. Let $(Z_{it})_{i,t}$ be an array of iid random variables with marginal distribution that is regularly varying with tail index $\alpha > 0$ and normalizing sequence $a_n$, i.e.,

$$\lim_{n \to \infty} n P(|Z_{it}| > a_n x) = x^{-\alpha}, \quad \text{for each } x > 0. \tag{1}$$

Equivalently, this means that $(|Z_{it}|)$ is in the maximum domain of attraction of a Fréchet distribution with parameter $\alpha > 0$. The sequence $a_n$ is then necessarily characterized by

$$a_n = n^{1/\alpha} L(n), \tag{2}$$

for some slowly varying function $L : \mathbb{R}_+ \to \mathbb{R}_+$, i.e., a function with the property that, for each $x > 0$, $\lim_{t \to \infty} L(tx)/L(t) = 1$. In certain cases we also assume that $Z_{11}$ satisfies the tail balancing condition, i.e., the existence of the limits

$$\lim_{x \to \infty} \frac{P(Z_{11} > x)}{P(|Z_{11}| > x)} = q \quad \text{and} \quad \lim_{x \to \infty} \frac{P(Z_{11} \le -x)}{P(|Z_{11}| > x)} = 1 - q \tag{3}$$

for some $0 \le q \le 1$. For each $p, n \in \mathbb{N}$, let $X = (X_{it})_{i,t}$ be the $p \times n$ data matrix, where, for each $i$,

$$X_{it} = \sum_{j=-\infty}^{\infty} c_j Z_{i,t-j} \tag{4}$$

is a stationary linear time series. To guarantee that the series in (4) converges almost surely, we assume that

$$\sum_{j=-\infty}^{\infty} |c_j|^{\delta} < \infty \quad \text{for some } \delta < \min\{\alpha, 1\}. \tag{5}$$

Thus in our model the rows of $X$ are given by iid copies of a linear process. We denote by $\lambda_1, \ldots, \lambda_p$ the unordered, and by $\lambda_{(1)} \ge \ldots \ge \lambda_{(p)}$ the ordered eigenvalues of the $p \times p$ sample covariance matrix $XX^T$. They are studied via the induced point process

$$N_n = \sum_{i=1}^{p} \epsilon_{a_{np}^{-2} \lambda_i}. \tag{6}$$

We will always assume that $p = p_n$ is an integer-valued sequence in $n$ that goes to infinity as $n \to \infty$ in order to obtain results in the 'large $n$, large $p$' setting. In the following we suppress the dependence of $p$ on $n$ so as to simplify the notation wherever this does not cause any ambiguity. In [3, 28] the iid case is considered, i.e., $X_{it} = Z_{it}$, assuming that the condition (1) holds for $0 < \alpha < 4$. They show that, if $p, n \to \infty$ with

$$\lim_{n \to \infty} \frac{p_n}{n} = \gamma \in (0, \infty), \tag{7}$$

$N_n$ converges in distribution to a Poisson process $N$ with intensity measure $\nu((x, \infty]) = x^{-\alpha/2}$. Our next theorem extends this result by considering the case where $X$ has dependent entries. More precisely, the rows of $X$ are given by independent copies of a linear process. It will turn out that the intensity measure of the limiting Poisson process depends on the sum of the squared coefficients of the underlying linear process.

Theorem 1. (i) Define the matrix $X = (X_{it})$ as in equations (1), (4) and (5) with $\alpha \in (0, 2)$. Suppose $p_n, n \to \infty$ such that

$$\limsup_{n \to \infty} \frac{p_n}{n^{\beta}} < \infty \tag{8}$$

for some $\beta > 0$ satisfying

(a) $\beta < \infty$ if $\alpha \in (0, 1]$, and

(b) $\beta < \max\left\{\frac{2-\alpha}{\alpha-1}, \frac{1}{2}\right\}$ if $\alpha \in (1, 2)$.

Further assume, in case $\alpha \in (5/3, 2)$, that $Z_{11}$ has mean zero and satisfies the tail balancing condition (3). Then the point process $N_n$ of the eigenvalues of $a_{np}^{-2} XX^T$ converges in distribution to a Poisson point process $N$ with intensity measure $\nu$ which is given by

$$\nu((x, \infty]) = x^{-\alpha/2} \Big( \sum_{j=-\infty}^{\infty} c_j^2 \Big)^{\alpha/2}, \quad x > 0.$$

(ii) Assume that $X_{it} = Z_{it}$ and equation (1) is satisfied with $\alpha \in (0, 2)$. Further, let either

(a) $p_n = n^{\kappa} l(n)$ for some $\kappa \in [0, \infty)$, where $l$ is a slowly varying function which converges to infinity if $\kappa = 0$, and is bounded away from zero if $\kappa = 1$, or

(b) $p_n \sim C \exp(c n^{\kappa})$ for some $\kappa, c, C > 0$.

Then $N_n$ converges in distribution to a Poisson point process with intensity measure given by $\nu((x, \infty]) = x^{-\alpha/2}$.

Theorem 1 (i) weakens the assumption of independent entries made so far in the literature on heavy-tailed random matrices at the expense of assumption (8), which is more restrictive than the usual assumption (7) if $\alpha \in [1.5, 2)$. However, if $\alpha \in (0, 1.5)$, our assumption (8) is more general than (7). This is important for statistical applications, because $p$ and $n$ are usually fixed and there is no functional relationship between the two of them. If we restrict ourselves to the iid case, then Theorem 1 (ii) shows that the point process convergence result also holds in many cases where the limit $\gamma$ from condition (7) is zero or infinity, for example, by assuming that $p$ is regularly varying in $n$.
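The growth restriction on $p$ in Theorem 1 (i) can be tabulated with a small helper. This is only a sketch: the name `beta_upper_bound` is an illustrative assumption, and the function simply evaluates the bound from cases (a) and (b). It makes the thresholds $\alpha = 1.5$ (where the bound equals 1, the exponent implicit in (7)) and $\alpha = 5/3$ (where the bound reaches $1/2$) easy to see.

```python
import math

def beta_upper_bound(alpha):
    """Strict upper bound on beta in condition (8) of Theorem 1 (i)."""
    if not 0.0 < alpha < 2.0:
        raise ValueError("alpha must lie in (0, 2)")
    if alpha <= 1.0:
        return math.inf  # case (a): any finite beta is allowed
    return max((2.0 - alpha) / (alpha - 1.0), 0.5)  # case (b)

for alpha in (0.8, 1.2, 1.5, 5.0 / 3.0, 1.9):
    print(alpha, beta_upper_bound(alpha))
```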

It is well known [26] that a Poisson process has an explicit representation as a transformation of a homogeneous Poisson process. In our case, the limiting Poisson process $N$ with intensity measure $\nu$ from Theorem 1 can be written as

$$N = \sum_{i=1}^{\infty} \epsilon_{\Gamma_i^{-2/\alpha} \sum_{j} c_j^2}, \tag{9}$$

where $\Gamma_i = \sum_{k=1}^{i} E_k$ is the successive sum of iid exponential random variables $E_k$ with mean one. The points of $N$ are labeled in decreasing order so that, by the continuous mapping theorem, we can easily deduce the weak limit of the $k$ largest eigenvalues of $XX^T$.
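The representation of the limiting process through partial sums of unit exponentials lends itself to direct simulation. The following sketch draws the first points $\Gamma_i^{-2/\alpha} \sum_j c_j^2$ and compares the empirical distribution of the largest point with the Fréchet-type law $\exp(-\nu((x, \infty]))$. The function names and the parameter values ($\alpha = 1.5$, $\sum_j c_j^2 = 1.25$, $x = 2$) are illustrative assumptions.

```python
import math
import random

def limit_points(k, alpha, sum_c_sq, rng):
    """First k points Gamma_i^(-2/alpha) * sum_c_sq of the limit process, in decreasing order."""
    points = []
    gamma_i = 0.0
    for _ in range(k):
        gamma_i += rng.expovariate(1.0)  # Gamma_i = E_1 + ... + E_i
        points.append(gamma_i ** (-2.0 / alpha) * sum_c_sq)
    return points

def frechet_cdf(x, alpha, sum_c_sq):
    """P(largest point <= x) = exp(-nu((x, inf])) for the intensity of Theorem 1."""
    return math.exp(-(x ** (-alpha / 2.0)) * sum_c_sq ** (alpha / 2.0))

rng = random.Random(0)
alpha, sum_c_sq, x = 1.5, 1.25, 2.0
draws = [limit_points(1, alpha, sum_c_sq, rng)[0] for _ in range(20000)]
empirical = sum(d <= x for d in draws) / len(draws)
print(empirical, frechet_cdf(x, alpha, sum_c_sq))
```

Because the $\Gamma_i$ are increasing, the generated points are automatically in decreasing order, which matches the labeling convention used in the text.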

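Corollary 2.1. Under the assumptions of Theorem 1 we have, for each fixed integer $k \ge 1$, that

$$a_{np}^{-2} \left( \lambda_{(1)}, \ldots, \lambda_{(k)} \right) \xrightarrow{D} \left( \Gamma_1^{-2/\alpha}, \ldots, \Gamma_k^{-2/\alpha} \right) \sum_{j=-\infty}^{\infty} c_j^2.$$

In particular, for each $x > 0$,

$$P\left( a_{np}^{-2} \lambda_{(k)} \le x \right) \xrightarrow[n \to \infty]{} P(N(x, \infty) \le k - 1) = e^{-x^{-\alpha/2} \left( \sum_j c_j^2 \right)^{\alpha/2}} \sum_{m=0}^{k-1} \frac{1}{m!} \left( x^{-\alpha/2} \Big( \sum_j c_j^2 \Big)^{\alpha/2} \right)^{m}.$$

Equivalently, in terms of the singular values $s_{(1)} = \sqrt{\lambda_{(1)}} \ge \ldots \ge s_{(p)} = \sqrt{\lambda_{(p)}}$ of the matrix $X$, we obtain, for any fixed positive integer $k$, that

$$a_{np}^{-1} \left( s_{(1)}, \ldots, s_{(k)} \right) \xrightarrow{D} \left( \Gamma_1^{-1/\alpha}, \ldots, \Gamma_k^{-1/\alpha} \right) \Big( \sum_{j=-\infty}^{\infty} c_j^2 \Big)^{1/2}.$$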
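The limiting distribution function of the $k$-th largest normalized eigenvalue is a Poisson tail sum and is straightforward to evaluate. The sketch below assumes the intensity $\nu((x, \infty]) = x^{-\alpha/2} (\sum_j c_j^2)^{\alpha/2}$ from Theorem 1; the function name `kth_largest_limit_cdf` is an illustrative choice.

```python
import math

def kth_largest_limit_cdf(x, k, alpha, sum_c_sq=1.0):
    """Limit of P(a_np^-2 * lambda_(k) <= x), i.e. P(N(x, inf) <= k - 1) for Poisson N."""
    nu = (x ** (-alpha / 2.0)) * sum_c_sq ** (alpha / 2.0)
    return math.exp(-nu) * sum(nu ** m / math.factorial(m) for m in range(k))

# For k = 1 this is the Frechet-type law exp(-nu); larger k gives larger values.
print(kth_largest_limit_cdf(1.0, 1, 1.0), kth_largest_limit_cdf(1.0, 2, 1.0))
```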



In a nutshell, the results in this section give the asymptotic behavior of the $k$ largest eigenvalues of a sample covariance matrix $XX^T$ when the rows of $X$ are given by iid copies of some linear process with infinite variance. Our results will be generalized further in Section 2.3, where, conditionally on a latent process, the rows of $X$ will be independent but not identically distributed.

2.2. Examples and discussion. Theorem 1 holds for any linear process which has regularly varying noise with infinite variance as long as condition (5) is satisfied. Since the coefficients of a causal ARMA process decay exponentially, (5) is trivially satisfied in this case. As two simple examples, consider an MA(1) process $X_{it} = Z_{it} + \theta Z_{i,t-1}$, which satisfies $\sum_j c_j^2 = 1 + \theta^2$; and a causal AR(1) process $X_{it} - \varphi X_{i,t-1} = Z_{it}$, $|\varphi| < 1$, where $\sum_j c_j^2 = (1 - \varphi^2)^{-1}$. Yet another example of a linear process fitting in our framework is a fractionally integrated ARMA($p$, $d$, $q$) process with $d < 0$ and regularly varying noise with index $\alpha \in [1, 2)$, see, e.g., [10] for further details. In this case $|c_j| \le C j^{d-1}$ is summable and therefore condition (5) is satisfied for $\alpha > 1$.
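The two closed-form values of $\sum_j c_j^2$ quoted for the MA(1) and AR(1) examples can be checked numerically. In this sketch the AR(1) coefficients $c_j = \varphi^j$, $j \ge 0$, are truncated after a fixed number of terms; the helper names and the truncation length are illustrative assumptions.

```python
def ma1_coefficients(theta):
    """Coefficients (c_0, c_1) of the MA(1) process Z_t + theta * Z_{t-1}."""
    return [1.0, theta]

def ar1_coefficients(phi, terms=200):
    """Truncated causal AR(1) coefficients c_j = phi^j for j = 0, ..., terms - 1."""
    return [phi ** j for j in range(terms)]

def sum_of_squares(coeffs):
    """Sum of squared coefficients, the constant entering the limiting intensity."""
    return sum(c * c for c in coeffs)

theta, phi = 0.5, 0.5
print(sum_of_squares(ma1_coefficients(theta)), 1.0 + theta ** 2)
print(sum_of_squares(ar1_coefficients(phi)), 1.0 / (1.0 - phi ** 2))
```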

Regarding the normalization, the sequence $a_n$ is chosen such that the individual entries of the matrix $Z := (Z_{it})_{i,t}$ satisfy (1). Replacing the iid sequence in the rows of $Z$ with a linear process to obtain the matrix $X$ changes the tail behavior of its entries. Indeed, the result stated in Davis and Resnick [11, eq. (2.7)] shows, under the assumption (5), that

$$n P\left( \Big( \sum_j |c_j|^{\alpha} \Big)^{-1/\alpha} |X_{it}| > a_n x \right) \to x^{-\alpha}.$$

In view of (1) this suggests the normalization $\tilde{X}_{it} = \big( \sum_j |c_j|^{\alpha} \big)^{-1/\alpha} X_{it}$. Denote by $\tilde{\lambda}_1, \ldots, \tilde{\lambda}_p$ the eigenvalues of $\tilde{X}\tilde{X}^T$, where $\tilde{X} = (\tilde{X}_{it})_{i,t}$. Since this is just a multiplication by a constant, we immediately obtain, by Theorem 1, that

$$\sum_{i=1}^{p} \epsilon_{a_{np}^{-2} \tilde{\lambda}_i} \xrightarrow{D} \tilde{N},$$

where $\tilde{N}$ is a Poisson process with intensity measure $\tilde{\nu}$ given by

$$\tilde{\nu}((x, \infty]) = x^{-\alpha/2} \left( \frac{\sum_j c_j^2}{\big( \sum_j |c_j|^{\alpha} \big)^{2/\alpha}} \right)^{\alpha/2}. \tag{10}$$

Thus $\sum_j c_j^2 / \big( \sum_j |c_j|^{\alpha} \big)^{2/\alpha}$ quantifies the effect of the dependence on the point process of the eigenvalues when the tail behavior of each marginal $X_{it}$ is equivalent to the iid case.
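The ratio of $\sum_j c_j^2$ to $(\sum_j |c_j|^{\alpha})^{2/\alpha}$, which measures the effect of the dependence after the marginal tails have been standardized, is easy to compute for a finite coefficient sequence. The sketch below uses the hypothetical helper name `dependence_factor`. Since the $\ell^2$ norm never exceeds the $\ell^{\alpha}$ norm for $\alpha \le 2$, the factor lies in $(0, 1]$, with equality to 1 in the iid case.

```python
def dependence_factor(coeffs, alpha):
    """sum_j c_j^2 divided by (sum_j |c_j|^alpha)^(2/alpha) for a finite coefficient list."""
    numerator = sum(c * c for c in coeffs)
    denominator = sum(abs(c) ** alpha for c in coeffs) ** (2.0 / alpha)
    return numerator / denominator

# iid case (single coefficient) versus an MA(1) process with theta = 1
print(dependence_factor([1.0], 1.5))
print(dependence_factor([1.0, 1.0], 1.0))
```

For the MA(1) example with $\theta = 1$ and $\alpha = 1$ the factor is $2 / 2^2 = 0.5$, so the limiting points are half as large as in the iid case with the same marginal tail.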



Assume for a moment that the dimension $p$ is fixed for any $n$. Then it follows easily from [11, Theorem 4.1] and arguments of our paper that $a_n^{-2} \lambda_{(1)} \to \sum_j c_j^2 \cdot \max_{1 \le i \le p} S_i$ in distribution as $n \to \infty$, where $(S_i)$ are independent positive stable with index $\alpha/2$, $0 < \alpha < 2$. If $p$ is large, one would intuitively expect that $\max_{1 \le i \le p} S_i \approx p^{2/\alpha} \Gamma_1^{-2/\alpha}$, where $\Gamma_1$ is exponentially distributed with mean 1. Corollary 2.1 not only makes this intuition precise but also gives the correct normalization $a_{np}^{-2}$. The distribution of the maximum of $p$ independent stables is not known analytically, hence 'large $n$, large $p$' in fact gives a simpler solution than the traditional 'fixed $p$, large $n$' setting.



2.3. Extension to random coefficient models. So far we have assumed that our observed process has independent components, each of which is modelled by the same linear process. From now on we will allow for a different set of coefficients in each row. To this end, let $(\theta_i)_{i \in \mathbb{N}}$ be a sequence of random variables independent of $(Z_{it})$ with values in some space $\Theta$. Assume that there is a family of measurable functions $(c_j : \Theta \to \mathbb{R})_{j \in \mathbb{N}}$ such that

$$\sup_{\theta \in \Theta} |c_j(\theta)| \le \tilde{c}_j, \quad \text{for some deterministic } \tilde{c}_j \text{ satisfying condition (5).} \tag{11}$$

Our observed processes have the form

$$X_{it} = \sum_{j=-\infty}^{\infty} c_j(\theta_i) Z_{i,t-j}, \tag{12}$$

where $(Z_{it})$ is given as in (1) with $\alpha \in (0, 2)$. Thus, conditionally on the latent process $(\theta_i)$, the rows of $X$ are independent linear processes with different coefficients. Unconditionally, the rows of $X$ are dependent if the sequence $(\theta_i)$ is dependent. Theorem 2 below covers three classes among which $(\theta_i)$ may be chosen: stationary ergodic; stationary but not necessarily ergodic; and ergodic in the Markov sense but not necessarily stationary. In the following we say that a sequence of point processes $M_n$ converges, conditionally on a sigma-algebra $\mathcal{H}$, in distribution to a point process $M$ if the conditional Laplace functionals converge almost surely, i.e., if there exists a measurable set $B$ with $P(B) = 1$ such that for all $\omega \in B$ and all nonnegative continuous functions $f$ with compact support,

$$E\left( e^{-M_n(f)} \mid \mathcal{H} \right)(\omega) \to E\left( e^{-M(f)} \mid \mathcal{H} \right)(\omega) \quad \text{as } n \to \infty. \tag{13}$$



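Theorem 2. Define $X = (X_{it})$ with $X_{it}$ as given in (12). Suppose that (11) is satisfied, and $p, n \to \infty$ such that (8) holds under the same conditions as in Theorem 1 (i). Further assume, in case $\alpha \in (5/3, 2)$, that $Z_{11}$ has mean zero and satisfies the tail balancing condition (3).

(i) If $(\theta_i)$ is a stationary ergodic sequence, then, both conditionally on $(\theta_i)$ as well as unconditionally, we have that

$$\sum_{i=1}^{p} \epsilon_{a_{np}^{-2} \lambda_i} \xrightarrow{D} \sum_{i=1}^{\infty} \epsilon_{\Gamma_i^{-2/\alpha} \left\| \sum_j c_j^2(\theta_1) \right\|_{\alpha/2}} \tag{14}$$

with the constant $\left\| \sum_j c_j^2(\theta_1) \right\|_{\alpha/2} = \left( E \left| \sum_j c_j^2(\theta_1) \right|^{\alpha/2} \right)^{2/\alpha}$, and $(\Gamma_i)$ as in (9).

(ii) If $(\theta_i)$ is stationary but not necessarily ergodic, then we have, conditionally on $(\theta_i)$, that

$$\sum_{i=1}^{p} \epsilon_{a_{np}^{-2} \lambda_i} \xrightarrow{D} \sum_{i=1}^{\infty} \epsilon_{\Gamma_i^{-2/\alpha} Y^{2/\alpha}}$$

with $Y = E\left( \left| \sum_j c_j^2(\theta_1) \right|^{\alpha/2} \,\middle|\, \mathcal{G} \right)$, where $\mathcal{G}$ is the invariant $\sigma$-field generated by $(\theta_i)$. In particular, $Y$ is independent of $(\Gamma_i)$.

(iii) Suppose $(\theta_i)$ is either an irreducible Markov chain on a countable state space $\Theta$ or a positive Harris chain in the sense of Meyn and Tweedie [22]. If $(\theta_i)$ has a stationary probability distribution $\pi$ then, conditionally on $(\theta_i)$ as well as unconditionally, (14) holds with

$$\left\| \sum_j c_j^2(\theta_1) \right\|_{\alpha/2} := \left( \int_{\Theta} \Big| \sum_j c_j^2(\theta) \Big|^{\alpha/2} \pi(d\theta) \right)^{2/\alpha}.$$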



One can view the assumptions (i) and (ii) of Theorem 2 in a Bayesian framework in which the parameters of the observed process are drawn from an unknown prior distribution. As an example, let $(\theta_i)$ be a stationary ergodic AR(1) process $\theta_i = \varphi \theta_{i-1} + \xi_i$, where $|\varphi| \ne 1$ and $(\xi_i)$ is an iid sequence, and set $X_{it} = Z_{it} + \theta_i Z_{i,t-1}$. Then, by Theorem 2 (i), we would expect, for $n$ and $p$ large enough, that

$$P\left( a_{np}^{-2} \lambda_{(1)} \le x \right) \approx \exp\left( - x^{-\alpha/2}\, E\left[ (1 + \theta_1^2)^{\alpha/2} \right] \right).$$

Models of this kind are referred to as random coefficient models and are often used in time series analysis, see, e.g., [19] for an overview. In the setting of Theorem 2 (iii) one might think of a Hidden Markov Model where the latent Markov process $(\theta_i)$ evolves along the rows of $X$, each state $\theta_i$ defining another univariate linear model.



3. Proofs and auxiliary results 



The first step is to show that the matrix $XX^T$ is well approximated by its diagonal, see Section 3.2. In the second step we then derive the extremes of the diagonal of $XX^T$ in Section 3.3. Both steps make use of a large deviation result which is presented in the upcoming section.

3.1. A large deviation result and its consequences. The next theorem gives the joint large deviations of the sum and the maximum of iid nonnegative random variables with infinite variance.

Proposition 3.1. Let $(x_n)_{n \in \mathbb{N}}$ and $(y_n)_{n \in \mathbb{N}}$ be sequences of nonnegative numbers with $x_n \to \infty$ such that $x_n / y_n \to \gamma \in (0, \infty]$. Suppose $(Y_t)_{t \in \mathbb{N}}$ is an iid sequence of nonnegative random variables with tail index $\alpha \in (0, 2)$ and normalizing sequence $b_n$. If $1 \le \alpha < 2$, we assume that $b_n x_n / n^{1+\gamma} \to \infty$ for some $\gamma > 0$. Then

$$\lim_{n \to \infty} \frac{P\left( \sum_{t=1}^{n} Y_t > b_n x_n,\ \max_{1 \le t \le n} Y_t > b_n y_n \right)}{n P(Y_1 > b_n \max\{x_n, y_n\})} = 1. \tag{15}$$

Proof. Let us first assume that $0 < \alpha < 1$. It can be easily seen that for any positive sequence $z_n \to \infty$ we have

$$\lim_{n \to \infty} \frac{P(\max_{1 \le t \le n} Y_t > b_n z_n)}{n P(Y_1 > b_n z_n)} = 1. \tag{16}$$

Obviously the limit in (15) is greater than or equal to one because $\sum_{t=1}^{n} Y_t \ge \max_{1 \le t \le n} Y_t$. Thus it is only left to prove that it is also smaller. Denote by $Y_{(1)} \ge \ldots \ge Y_{(n)}$ the order statistics of $Y_1, \ldots, Y_n$. By decomposing $\sum_t Y_t$ into the sum of $\max_t Y_t$ and lower order terms we see that, for any $\theta \in (0, 1)$,

$$\frac{P\left( \sum_{t=1}^{n} Y_t > b_n x_n,\ \max_{1 \le t \le n} Y_t > b_n y_n \right)}{n P(Y_1 > b_n \max\{x_n, y_n\})} \le \frac{P(\max_{1 \le t \le n} Y_t > b_n \max\{\theta x_n, y_n\})}{n P(Y_1 > b_n \max\{x_n, y_n\})} + \frac{P\left( \sum_{t=2}^{n} Y_{(t)} > b_n x_n (1 - \theta) \right)}{n P(Y_1 > b_n \max\{x_n, y_n\})}.$$

By exactly the same arguments as in the proof of (16) one shows that

$$\lim_{\theta \to 1} \lim_{n \to \infty} \frac{P(\max_{1 \le t \le n} Y_t > b_n \max\{\theta x_n, y_n\})}{n P(Y_1 > b_n \max\{x_n, y_n\})} = 1.$$

Hence, it is only left to show that the second summand vanishes as $n \to \infty$. To this end we partition the underlying probability space into $\{Y_{(2)} \le \epsilon b_n x_n\} \cup \{Y_{(2)} > \epsilon b_n x_n\}$, $\epsilon > 0$, to obtain

$$\frac{P\left( \sum_{t=2}^{n} Y_{(t)} > b_n x_n (1 - \theta) \right)}{n P(Y_1 > b_n \max\{x_n, y_n\})} \le \frac{P\left( \sum_{t=2}^{n} Y_{(t)} \mathbf{1}_{\{Y_{(t)} \le \epsilon b_n x_n\}} > b_n x_n (1 - \theta) \right)}{n P(Y_1 > b_n \max\{x_n, y_n\})} + \frac{P(Y_{(2)} > \epsilon b_n x_n)}{n P(Y_1 > b_n \max\{x_n, y_n\})} = \Sigma_1 + \Sigma_2.$$

Denote by $M_n = \max_{1 \le t \le n} Y_t$ and $z_n = \max\{x_n, y_n\}$. Then easy combinatorics and (16) yield

$$\Sigma_2 = \frac{1 - P(Y_{(2)} \le \epsilon b_n x_n)}{n P(Y_1 > b_n z_n)} = \frac{1 - P(M_n \le \epsilon b_n x_n)}{n P(Y_1 > b_n z_n)} - \frac{n P(M_{n-1} \le \epsilon b_n x_n) P(Y_1 > \epsilon b_n x_n)}{n P(Y_1 > b_n z_n)}$$

$$= \frac{P(M_n > \epsilon b_n x_n)}{n P(Y_1 > \epsilon b_n x_n)} \frac{P(Y_1 > \epsilon b_n x_n)}{P(Y_1 > b_n z_n)} - \frac{P(Y_1 > \epsilon b_n x_n)}{P(Y_1 > b_n z_n)} P(M_{n-1} \le \epsilon b_n x_n) \sim \frac{P(Y_1 > \epsilon b_n x_n)}{P(Y_1 > b_n z_n)} \left( 1 - P(M_{n-1} \le \epsilon b_n x_n) \right) \xrightarrow[n \to \infty]{} 0.$$

The convergence to zero follows from $P(M_{n-1} \le \epsilon b_n x_n) \to 1$ and, by [26, Proposition 0.8 (iii)], $\frac{P(Y_1 > \epsilon b_n x_n)}{P(Y_1 > b_n z_n)} \to \epsilon^{-\alpha} \max\{1, \gamma^{-1}\}^{\alpha}$. Thus it is only left to show that $\Sigma_1$ goes to zero. By Markov's inequality and Karamata's Theorem [26, Theorem 0.6] we have that

$$\Sigma_1 \le \frac{1}{b_n x_n (1 - \theta)} \frac{E\left( Y_1 \mathbf{1}_{\{Y_1 \le \epsilon b_n x_n\}} \right)}{P(Y_1 > b_n z_n)} \sim \frac{1}{1 - \theta} \frac{\alpha}{1 - \alpha} \frac{\epsilon P(Y_1 > \epsilon b_n x_n)}{P(Y_1 > b_n z_n)} \to \frac{1}{1 - \theta} \frac{\alpha}{1 - \alpha} \epsilon^{1 - \alpha} \max\{1, \gamma^{-1}\}^{\alpha},$$

which converges to zero as $\epsilon$ goes to zero, since $\alpha < 1$. Thus for $0 < \alpha < 1$ the proof is complete.

If $1 \le \alpha < 2$, only $\Sigma_1$ has to be treated differently. The truncated mean $\mu_n = E(Y_1 \mathbf{1}_{\{Y_1 \le \epsilon b_n x_n\}})$ either converges to a constant or is a slowly varying function. In either case, we have that $b_n x_n / (n \mu_n) = b_n x_n n^{-1-\gamma} n^{\gamma} / \mu_n \to \infty$ by assumption. Thus, a mean-correction argument and Karamata's Theorem imply

$$\limsup_{n \to \infty} \Sigma_1 \le \limsup_{n \to \infty} \frac{P\left( \sum_{t=1}^{n} Y_t \mathbf{1}_{\{Y_t \le \epsilon b_n x_n\}} - n \mu_n > b_n x_n (1 - \theta) - n \mu_n \right)}{n P(Y_1 > b_n z_n)}$$

$$\le \frac{1}{(1 - \theta)^2} \limsup_{n \to \infty} \frac{E\left( Y_1^2 \mathbf{1}_{\{Y_1 \le \epsilon b_n x_n\}} \right)}{b_n^2 x_n^2 P(Y_1 > b_n z_n)} \le \frac{1}{(1 - \theta)^2} \frac{\alpha/2}{1 - \alpha/2} \epsilon^{2 - \alpha} \max\{1, \gamma^{-1}\}^{\alpha} \xrightarrow[\epsilon \to 0]{} 0,$$

since $\alpha < 2$. This completes the proof. □
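The Karamata step used above for $0 < \alpha < 1$, namely that the truncated mean $E(Y_1 \mathbf{1}_{\{Y_1 \le u\}})$ behaves like $\frac{\alpha}{1-\alpha} u P(Y_1 > u)$ as $u \to \infty$, can be checked in closed form for an exact Pareto tail. The sketch below assumes $P(Y > y) = y^{-\alpha}$ for $y \ge 1$, which is an illustrative special case and not the general regularly varying setting of the proposition; the function names are hypothetical.

```python
def truncated_mean_pareto(u, alpha):
    """E[Y 1{Y <= u}] for P(Y > y) = y^(-alpha), y >= 1, with 0 < alpha < 1 and u >= 1."""
    return alpha / (1.0 - alpha) * (u ** (1.0 - alpha) - 1.0)

def karamata_ratio(u, alpha):
    """Truncated mean divided by alpha/(1-alpha) * u * P(Y > u); tends to 1 as u grows."""
    tail = u ** (-alpha)
    return truncated_mean_pareto(u, alpha) / (alpha / (1.0 - alpha) * u * tail)

for u in (1e2, 1e4, 1e8):
    print(u, karamata_ratio(u, 0.5))
```

In this special case the ratio equals $1 - u^{\alpha - 1}$, so the rate of convergence slows down as $\alpha$ approaches 1.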



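We finish this section with a few consequences of Proposition 3.1. Note that (1) implies

$$p n P(Z_{11}^2 > a_{np}^2 x) \xrightarrow[n \to \infty]{} x^{-\alpha/2} \quad \text{for each } x > 0. \tag{17}$$

Choosing $Y_t = Z_{1t}^2$, $b_n = a_n^2$, $x_n = x a_{np}^2 / a_n^2$ and $y_n = y a_{np}^2 / a_n^2$, we have from Proposition 3.1 and (17), for $\alpha \in (0, 2)$, that

$$p P\left( \sum_{t=1}^{n} Z_{1t}^2 > a_{np}^2 x,\ \max_{1 \le t \le n} Z_{1t}^2 > a_{np}^2 y \right) \xrightarrow[n \to \infty]{} \max\{x, y\}^{-\alpha/2} \quad \text{for each } x, y > 0.$$

Therefore, by [26, Proposition 3.21], we obtain the point process convergence

$$\sum_{i=1}^{p} \epsilon_{a_{np}^{-2} \left( \sum_{t=1}^{n} Z_{it}^2,\ \max_{1 \le t \le n} Z_{it}^2 \right)} \xrightarrow{D} \sum_{i=1}^{\infty} \epsilon_{\Gamma_i^{-2/\alpha} (1, 1)}, \tag{18}$$

with $(\Gamma_i)$ as in (9). For another application of Proposition 3.1, set $Y_t = |Z_{1t}|$, $b_n = a_n$, $x_n = x a_{np} / a_n$ and $y_n = y a_{np} / a_n$. Under the additional assumption

$$\liminf_{n \to \infty} \frac{p_n}{n} \in (0, \infty]$$

we have $b_n x_n / n^{1+\gamma} \to \infty$ for some $\gamma < (2 - \alpha)/\alpha$, thus, for $\alpha \in (0, 2)$,

$$p P\left( \sum_{t=1}^{n} |Z_{1t}| > a_{np} x,\ \max_{1 \le t \le n} |Z_{1t}| > a_{np} y \right) \xrightarrow[n \to \infty]{} \max\{x, y\}^{-\alpha} \quad \text{for each } x, y > 0.$$

Therefore we obtain as before

$$\sum_{i=1}^{p} \epsilon_{a_{np}^{-1} \left( \sum_{t=1}^{n} |Z_{it}|,\ \max_{1 \le t \le n} |Z_{it}| \right)} \xrightarrow{D} \sum_{i=1}^{\infty} \epsilon_{\Gamma_i^{-1/\alpha} (1, 1)}. \tag{19}$$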

The result of the following proposition is also a consequence of Proposition 3.1.

Proposition 3.2. Let $(Z_{it})$ be as in (1) with $0 < \alpha < 2$. Suppose that (8) is satisfied for some $0 < \beta < \infty$. Then

$$a_{np}^{-2} \max_{1 \le i < j \le p} \sum_{t=1}^{n} |Z_{it} Z_{jt}| \xrightarrow[n \to \infty]{P} 0.$$

Proof. By [14], the iid random variables $Y_t = |Z_{1t} Z_{2t}|$ are regularly varying with tail index $\alpha$ with some normalizing sequence $b_n$. Thus, there exists a slowly varying $L_1$ such that $P(Y_1 > x) = x^{-\alpha} L_1(x)$. Using (2) this implies

$$p^2 n P(Y_1 > a_{np}^2 \epsilon) = n^{-1} \epsilon^{-\alpha} L(np)^{-2\alpha} L_1\left( (np)^{2/\alpha} L(np)^2 \epsilon \right).$$

By Potter's bound, see, e.g., [26, Proposition 0.8 (ii)], for any slowly varying function $L$ and any $\delta > 0$ there exist $c_1, c_2 > 0$ such that $c_1 n^{-\delta} < L(n) < c_2 n^{\delta}$ for $n$ large enough. An application of this bound together with assumption (8) shows that

$$p^2 n P(Y_1 > a_{np}^2 \epsilon) \xrightarrow[n \to \infty]{} 0. \tag{20}$$

Hence, using Proposition 3.1 with $x_n = a_{np}^2 \epsilon / b_n$ and $y_n = 0$ yields

$$P\left( \max_{1 \le i < j \le p} \sum_{t=1}^{n} |Z_{it} Z_{jt}| > a_{np}^2 \epsilon \right) \le p^2 P\left( \sum_{t=1}^{n} Y_t > a_{np}^2 \epsilon \right) \sim p^2 n P(Y_1 > a_{np}^2 \epsilon) \xrightarrow[n \to \infty]{} 0,$$

since $b_n x_n / n^{1+\gamma} = a_{np}^2 \epsilon / n^{1+\gamma} \to \infty$ for $\alpha < 2$ and some $\gamma < (2 - \alpha)/\alpha$. □



3.2. Convergence in Operator Norm. Denote by $D = \mathrm{diag}(XX^T)$ the diagonal of the matrix $XX^T$, i.e., $D_{ii} = (XX^T)_{ii}$ and $D_{ij} = 0$ for $i \ne j$. In this section we show that $a_{np}^{-2} XX^T$ converges in probability to $a_{np}^{-2} D$ in operator norm. This implies that the off-diagonal elements of $a_{np}^{-2} XX^T$ do not contribute to the limiting eigenvalue spectrum. Recall that, for a real $p \times n$ matrix $A$, the operator 2-norm $\|A\|_2$ is the square root of the largest eigenvalue of $AA^T$, and the infinity-norm is given by $\|A\|_\infty = \max_{1 \le i \le p} \sum_{j=1}^{n} |A_{ij}|$. The following result holds under a much more general setting than assumed in Theorem 1 by allowing for an arbitrary dependence structure within the rows of $X$.
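To get a feeling for the statement that the off-diagonal part of $XX^T$ is negligible, one can simulate a small heavy-tailed matrix and compare its largest eigenvalue with its largest diagonal entry and its largest squared entry. The sketch below is only an illustration under assumptions that are not in the text: iid symmetrized Pareto entries with $\alpha = 0.8$, dimensions $p = 20$, $n = 40$, and a plain power iteration (started at the coordinate of the largest diagonal entry) as the eigenvalue routine.

```python
import random

def heavy_tailed_matrix(p, n, alpha, rng):
    """p x n matrix of iid symmetrized Pareto entries with P(|Z| > z) = z^(-alpha), z >= 1."""
    return [[rng.choice((-1.0, 1.0)) * (1.0 - rng.random()) ** (-1.0 / alpha)
             for _ in range(n)] for _ in range(p)]

def gram(x):
    """The p x p matrix X X^T as a list of lists."""
    p = len(x)
    return [[sum(a * b for a, b in zip(x[i], x[j])) for j in range(p)] for i in range(p)]

def largest_eigenvalue(a, start, iterations=500):
    """Rayleigh quotient after power iteration started at the unit vector e_start."""
    p = len(a)
    v = [0.0] * p
    v[start] = 1.0
    for _ in range(iterations):
        w = [sum(a[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    av = [sum(a[i][j] * v[j] for j in range(p)) for i in range(p)]
    return sum(v[i] * av[i] for i in range(p))

rng = random.Random(1)
x = heavy_tailed_matrix(20, 40, 0.8, rng)
a = gram(x)
diag = [a[i][i] for i in range(len(a))]
i_max = max(range(len(diag)), key=diag.__getitem__)
lam_1 = largest_eigenvalue(a, i_max)
max_sq = max(z * z for row in x for z in row)
print(lam_1, diag[i_max], max_sq)
```

The orderings $\lambda_{(1)} \ge \max_i D_{ii} \ge \max_{i,t} X_{it}^2$ always hold for $XX^T$; the interest of the simulation is in how close the three printed numbers are for small $\alpha$.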

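Proposition 3.3. Let $X = (X_{it})_{i,t}$ be a $p \times n$ random matrix whose entries are identically distributed with tail index $\alpha \in (0, 2)$ and normalizing sequence $(a_n)$. Assume that the rows of $X$ are independent. Suppose that (8) holds for some $\beta > 0$. If $1 < \alpha < 2$, assume additionally that $\beta < (2 - \alpha)/(\alpha - 1)$. Then we have

$$a_{np}^{-2} \left\| XX^T - D \right\|_2 \xrightarrow[n \to \infty]{P} 0. \tag{21}$$

Proof. Since $\|XX^T - D\|_2 \le \|XX^T - D\|_\infty$, it is enough to show that for every $\epsilon \in (0, 1)$,

$$P\left( \max_{i=1,\ldots,p} \sum_{j=1, j \ne i}^{p} \Big| \sum_{t=1}^{n} X_{it} X_{jt} \Big| > a_{np}^2 \epsilon \right) \le p P\left( \sum_{j=2}^{p} \sum_{t=1}^{n} |X_{1t} X_{jt}| > a_{np}^2 \epsilon \right) \xrightarrow[n \to \infty]{} 0.$$

By partitioning the underlying probability space into $\{\max_{j,t} |X_{1t} X_{jt}| \le a_{np}^2\}$ and its complement, we obtain that

$$p P\left( \sum_{j=2}^{p} \sum_{t=1}^{n} |X_{1t} X_{jt}| > a_{np}^2 \epsilon \right) \le p P\left( \sum_{j=2}^{p} \sum_{t=1}^{n} |X_{1t} X_{jt}| \mathbf{1}_{\{|X_{1t} X_{jt}| \le a_{np}^2\}} > a_{np}^2 \epsilon \right) + p P\left( \max_{2 \le j \le p} \max_{1 \le t \le n} |X_{1t} X_{jt}| > a_{np}^2 \right) = I + II.$$

The same argument used for (20) shows that $II \le p^2 n P(|X_{11} X_{21}| > a_{np}^2) \to 0$ by independence of the rows of $X$. To deal with term $I$ we first assume that $\alpha > 1$ and choose some $\gamma \in (\alpha, 2)$. Hölder's inequality shows that

$$\sum_{j=2}^{p} \sum_{t=1}^{n} |X_{1t} X_{jt}| \mathbf{1}_{\{|X_{1t} X_{jt}| \le a_{np}^2\}} \le \left( \sum_{j=2}^{p} \sum_{t=1}^{n} |X_{1t} X_{jt}|^{\gamma} \mathbf{1}_{\{|X_{1t} X_{jt}| \le a_{np}^2\}} \right)^{1/\gamma} (np)^{(\gamma - 1)/\gamma}$$

and therefore

$$I \le p P\left( \sum_{j=2}^{p} \sum_{t=1}^{n} |X_{1t} X_{jt}|^{\gamma} \mathbf{1}_{\{|X_{1t} X_{jt}| \le a_{np}^2\}} > \frac{a_{np}^{2\gamma} \epsilon^{\gamma}}{(np)^{\gamma - 1}} \right).$$

Note that $|X_{1t} X_{jt}|^{\gamma}$ has regularly varying tails with index $\alpha/\gamma < 1$. Hence we can apply Markov's inequality and Karamata's Theorem to infer that

$$I \le c\, \epsilon^{-\gamma} (np)^{\gamma - 1} p^2 n P\left( |X_{11} X_{21}| > a_{np}^2 \right). \tag{22}$$

Therefore, the proof of Proposition 3.2 shows that the term in (22) goes to zero if $(np)^{\gamma - 1}/n$ does. In view of assumption (8) this is true for $\beta < (2 - \gamma)/(\gamma - 1)$. Since we can choose $\gamma$ arbitrarily close to $\alpha$ it suffices that $\beta < (2 - \alpha)/(\alpha - 1)$. If $\alpha < 1$ we do not need Hölder's inequality since the above argument can be applied with $\gamma = 1$, thus it suffices that (8) holds for some $\beta < \infty$. For the remaining case $\alpha = 1$ we choose $\gamma$ arbitrarily close to 1 so that $(np)^{\gamma - 1}/n \to 0$ for any given $\beta > 0$. □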

The above result can be improved for $5/3 < \alpha < 2$ if we assume the rows of $X$ to be realizations of a linear process.

Proposition 3.4. The assumptions of Theorem 1 (i) imply (21).



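Proof. By Proposition 3.3 it suffices to show that, for $\alpha \in (5/3, 2)$, the assumption

$$\lim_{n \to \infty} \frac{p}{\sqrt{n}} = 0 \tag{23}$$

implies convergence in operator norm in the sense of (21). In this proof, $c$ denotes a positive constant that may vary from expression to expression. Define

$$Z_{it}^{L} = Z_{it} \mathbf{1}_{\{|Z_{it}| \le a_{np}\}}, \quad Z_{it}^{U} = Z_{it} \mathbf{1}_{\{|Z_{it}| > a_{np}\}}, \quad X_{it}^{L} = \sum_{k} c_k Z_{i,t-k}^{L}, \quad X_{it}^{U} = \sum_{k} c_k Z_{i,t-k}^{U}.$$

Using $\|XX^T - D\|_2 \le \|XX^T - D\|_\infty$ as before we have

$$P\left( \|XX^T - D\|_2 > a_{np}^2 \epsilon \right) \le p P\left( \sum_{j=2}^{p} \Big| \sum_{t=1}^{n} X_{1t} X_{jt} \Big| > a_{np}^2 \epsilon \right)$$

$$\le p P\left( \sum_{j=2}^{p} \Big| \sum_{t=1}^{n} X_{1t}^{L} X_{jt}^{L} \Big| > \frac{a_{np}^2}{4} \epsilon \right) + p P\left( \sum_{j=2}^{p} \Big| \sum_{t=1}^{n} X_{1t}^{L} X_{jt}^{U} \Big| > \frac{a_{np}^2}{4} \epsilon \right) + p P\left( \sum_{j=2}^{p} \Big| \sum_{t=1}^{n} X_{1t}^{U} X_{jt}^{L} \Big| > \frac{a_{np}^2}{4} \epsilon \right) + p P\left( \sum_{j=2}^{p} \Big| \sum_{t=1}^{n} X_{1t}^{U} X_{jt}^{U} \Big| > \frac{a_{np}^2}{4} \epsilon \right)$$

$$= I + II + III + IV.$$

We will show that each of these terms converges to zero. To this end, note that $E|Z_{11}^{L}|$ converges to a constant, and, by Karamata's Theorem,

$$E|Z_{11}^{U}| \sim c\, a_{np} P(|Z_{11}| > a_{np}) \sim c\, a_{np} (np)^{-1}, \quad n \to \infty.$$

Therefore, by Markov's inequality, we have

$$II \le \frac{4p}{a_{np}^2 \epsilon} \sum_{j=2}^{p} \sum_{t=1}^{n} \sum_{k,l} |c_k c_l|\, E|Z_{1,t-k}^{L}|\, E|Z_{j,t-l}^{U}| \le c \Big( \sum_k |c_k| \Big)^2 \frac{p^2 n}{a_{np}^2} a_{np} (np)^{-1} = c \frac{p}{a_{np}},$$

and, by (2), we obtain that this is equal to $c\, L(np)^{-1} p^{1 - 1/\alpha} n^{-1/\alpha} \to 0$ as $n \to \infty$. By symmetry, $III$ can be handled the same way. It is easy to see that term $IV$ is of even lower order, namely

$$c \frac{p^2 n}{a_{np}^2} \left( a_{np} (np)^{-1} \right)^2 = c\, n^{-1} \to 0.$$

Thus it is only left to show that $I$ converges to zero. To this end, we use Karamata's Theorem to obtain

$$E\left[ (Z_{11}^{L})^2 \right] = E\left[ Z_{11}^2 \mathbf{1}_{\{|Z_{11}| \le a_{np}\}} \right] \sim c\, a_{np}^2 P(|Z_{11}| > a_{np}) \sim c\, a_{np}^2 (np)^{-1}.$$

Since $Z_{11}$ satisfies the tail balancing condition (3), we can apply Karamata's Theorem to the positive and the negative tail of $Z_{11}$, thus

$$\xi_n := E[Z_{11}^{L}] = E\left[ Z_{11} \mathbf{1}_{\{|Z_{11}| \le a_{np}\}} \right] = -E\left[ Z_{11} \mathbf{1}_{\{|Z_{11}| > a_{np}\}} \right] = -E\left[ Z_{11} \mathbf{1}_{\{Z_{11} > a_{np}\}} \right] + E\left[ -Z_{11} \mathbf{1}_{\{-Z_{11} > a_{np}\}} \right]$$

$$\sim -q \frac{\alpha}{\alpha - 1} a_{np} P(|Z_{11}| > a_{np}) + (1 - q) \frac{\alpha}{\alpha - 1} a_{np} P(|Z_{11}| > a_{np}) \sim (1 - 2q) \frac{\alpha}{\alpha - 1} a_{np} (np)^{-1}.$$

As a consequence we obtain, with $d := (1 - 2q) \frac{\alpha}{\alpha - 1}$, that

$$\mu_n = E\left( X_{11}^{L} X_{21}^{L} \right) = \left( E X_{11}^{L} \right)^2 = \Big( \sum_k c_k \Big)^2 \xi_n^2 \sim \Big( \sum_k c_k \Big)^2 d^2 a_{np}^2 (np)^{-2},$$

and $(\mu_n p n)/a_{np}^2 \sim c\, (np)^{-1} \to 0$. Therefore we obtain for summand $I$ that

$$I = p P\left( \sum_{j=2}^{p} \Big| \sum_{t=1}^{n} X_{1t}^{L} X_{jt}^{L} \Big| > \frac{a_{np}^2}{4} \epsilon \right) \le p^2 P\left( \Big| \sum_{t=1}^{n} X_{1t}^{L} X_{2t}^{L} \Big| > \frac{a_{np}^2}{4p} \epsilon \right) \le p^2 P\left( \Big| \sum_{t=1}^{n} X_{1t}^{L} X_{2t}^{L} - n \mu_n \Big| > \frac{a_{np}^2}{4p} \epsilon - n |\mu_n| \right).$$

Since we correct by the mean, Markov's inequality yields

$$p^2 P\left( \Big| \sum_{t=1}^{n} X_{1t}^{L} X_{2t}^{L} - n \mu_n \Big| > \frac{a_{np}^2}{4p} \epsilon - n |\mu_n| \right) \le \frac{16 p^4}{a_{np}^4 \epsilon^2} (1 + o(1)) \sum_{t,t'=1}^{n} \sum_{k,k',l,l'} c_k c_{k'} c_l c_{l'} \,\mathrm{Cov}\left( Z_{1,t-k}^{L} Z_{2,t-l}^{L},\ Z_{1,t'-k'}^{L} Z_{2,t'-l'}^{L} \right). \tag{24}$$

Due to the independence of the $Z$'s, the covariance in the last expression is non-zero iff $t - k = t' - k'$ or $t - l = t' - l'$. This gives us three distinct cases we deal with separately. First, assume that both $t - k = t' - k'$ and $t - l = t' - l'$. Then the covariance in (24) is equal to $\mathrm{Var}(Z_{11}^{L} Z_{21}^{L})$ and so bounded by

$$E\left[ (Z_{11}^{L})^2 (Z_{21}^{L})^2 \right] = \left( E (Z_{11}^{L})^2 \right)^2 \sim \left( c\, a_{np}^2 (np)^{-1} \right)^2 \sim c\, a_{np}^4 (np)^{-2}.$$

Second, let $t - k = t' - k'$ but $t - l \ne t' - l'$. Then the covariance becomes

$$E\left( (Z_{1,t-k}^{L})^2 \right) \xi_n^2 - \xi_n^4 \sim c\, a_{np}^2 (np)^{-1} \left( \pm c\, a_{np} (np)^{-1} \right)^2 - \left( \pm c\, a_{np} (np)^{-1} \right)^4 \sim c\, a_{np}^4 (np)^{-3} - c\, a_{np}^4 (np)^{-4} \sim c\, a_{np}^4 (np)^{-3},$$

which is of lower order than in the case considered before. By symmetry, the third case, where $t - l = t' - l'$ but $t - k \ne t' - k'$, can be dealt with in exactly the same way. In all cases $t'$ can be assumed to be fixed, thus we can bound (24) by

$$c \frac{p^4}{a_{np}^4} \Big( \sum_k |c_k| \Big)^4 \sum_{t=1}^{n} a_{np}^4 (np)^{-2} = c \frac{p^2}{n} \to 0, \quad n \to \infty.$$

□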

3.3. Extremes of $\operatorname{diag}(XX^T)$. In this section we analyze the extremes of the diagonal entries of $XX^T$, which are partial sums of squares of linear processes. To this end, we start with an auxiliary result.

Proposition 3.5. Let $(Z_t)$ be an iid sequence such that $nP(|Z_1| > a_nx) \to x^{-\alpha}$ with $\alpha\in(0,2)$. For any sequence $(c_j)$ satisfying (5) we have, if $p$ and $n$ go to infinity, that

\[ pP\Big(\sum_{t=1}^n\sum_{j=-\infty}^{\infty}c_j^2Z_{t-j}^2 > a_{np}^2x\Big) \to x^{-\alpha/2}\Big(\sum_{j=-\infty}^{\infty}c_j^2\Big)^{\alpha/2}. \]



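Proof. Fix some $x > 0$. Observe that Proposition 3.1 and (17) imply for $n\to\infty$ that $pP\big(\sum_{t=1}^nZ_t^2 > a_{np}^2x\big) \to x^{-\alpha/2}$. We begin by showing the claim for a linear process of finite order. For any $\eta > 0$ we have

\[ pP\Big(\Big|\sum_{j=-m}^mc_j^2\sum_{t=1}^nZ_{t-j}^2 - \sum_{t=1}^nZ_t^2\sum_{j=-m}^mc_j^2\Big| > a_{np}^2\eta\Big) \to 0, \]

since, for each $j$, the sums $\sum_{t=1}^nZ_{t-j}^2$ and $\sum_{t=1}^nZ_t^2$ differ in at most $2m$ boundary terms.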



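Consequently,

\[ \lim_{n\to\infty}pP\Big(\sum_{t=1}^n\sum_{j=-m}^mc_j^2Z_{t-j}^2 > a_{np}^2x\Big) = x^{-\alpha/2}\Big(\sum_{j=-m}^mc_j^2\Big)^{\alpha/2}. \tag{25} \]

This and the positivity of the summands implies

\[ \liminf_{n\to\infty}pP\Big(\sum_{t=1}^n\sum_{j=-\infty}^{\infty}c_j^2Z_{t-j}^2 > a_{np}^2x\Big) \ge x^{-\alpha/2}\Big(\sum_{j=-\infty}^{\infty}c_j^2\Big)^{\alpha/2}. \tag{26} \]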



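Thus it is only left to show that the limsup is bounded by the right hand side of (26). Using Markov's inequality yields

\[ pP\Big(\sum_{t=1}^n\sum_{j=-\infty}^{\infty}c_j^2Z_{t-j}^2 > a_{np}^2x\Big) \le \sum_{j=-\infty}^{\infty}pnP\big(c_j^2Z_1^2 > a_{np}^2x\big) + \sum_{j=-\infty}^{\infty}c_j^2\,\frac{pn}{a_{np}^2x}\,E\big(Z_1^2\mathbf 1_{\{c_j^2Z_1^2\le a_{np}^2x\}}\big). \]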
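The bound just used splits the event into "some summand is large" and "the truncated sum is large", then applies Markov's inequality to the truncated part: for nonnegative $Y_1,\dots,Y_n$ one has $P(\sum_i Y_i > y) \le \sum_i P(Y_i > y) + y^{-1}\sum_i E(Y_i\mathbf 1_{\{Y_i\le y\}})$. The sketch below checks this by exact enumeration; the three-point law and the names `law`, `n_terms` and `y` are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy law for nonnegative iid summands: value -> probability.
law = {Fraction(0): Fraction(1, 2), Fraction(1): Fraction(3, 10),
       Fraction(5): Fraction(1, 5)}

def exact_tail_of_sum(law, n_terms, y):
    """P(Y_1 + ... + Y_n > y) for iid Y_i with the given law, by enumeration."""
    total = Fraction(0)
    for combo in product(law.items(), repeat=n_terms):
        prob = Fraction(1)
        s = Fraction(0)
        for value, p in combo:
            prob *= p
            s += value
        if s > y:
            total += prob
    return total

def truncation_bound(law, n_terms, y):
    """n * P(Y > y) + n * E[Y 1{Y <= y}] / y."""
    tail = sum(p for v, p in law.items() if v > y)
    truncated_mean = sum(v * p for v, p in law.items() if v <= y)
    return n_terms * tail + n_terms * truncated_mean / y

n_terms, y = 3, Fraction(4)
lhs = exact_tail_of_sum(law, n_terms, y)
rhs = truncation_bound(law, n_terms, y)
print(lhs, rhs, lhs <= rhs)
```

In the proof the two pieces are then bounded separately, the tail part by regular variation and the truncated-mean part by Karamata's Theorem.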
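Since $E\big(Z_1^2\mathbf 1_{\{Z_1^2\le\,\cdot\,\}}\big)$ is a regularly varying function with index $1-\alpha/2$ we obtain, by Potter's bound, Karamata's Theorem and (5), that, for some constant $C_1 > 0$,

\[
\begin{aligned}
c_j^2\,\frac{pn}{a_{np}^2x}\,E\big(Z_1^2\mathbf 1_{\{c_j^2Z_1^2\le a_{np}^2x\}}\big)
&= c_j^2\,\frac{E\big(Z_1^2\mathbf 1_{\{Z_1^2\le a_{np}^2x/c_j^2\}}\big)}{E\big(Z_1^2\mathbf 1_{\{Z_1^2\le a_{np}^2x\}}\big)}\cdot\frac{pn}{a_{np}^2x}\,E\big(Z_1^2\mathbf 1_{\{Z_1^2\le a_{np}^2x\}}\big) \\
&\le C_1\,c_j^2\,(c_j^{-2})^{1-\alpha/2+(\alpha/2-\delta/2)}\,x^{-\alpha/2} = C_1x^{-\alpha/2}|c_j|^{\delta}.
\end{aligned}
\]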



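Likewise, $pnP(a_{np}^{-2}Z_1^2 > \cdot\,)$ is a regularly varying function with index $-\alpha/2$, thus we obtain, by the same arguments as before, that

\[ pnP\big(c_j^2Z_1^2 > a_{np}^2x\big) \le C_2x^{-\alpha/2}|c_j|^{\delta}. \]

With $C := C_1 + C_2$ this therefore implies

\[ \limsup_{n\to\infty}pP\Big(\sum_{t=1}^n\sum_{j=-\infty}^{\infty}c_j^2Z_{t-j}^2 > a_{np}^2x\Big) \le Cx^{-\alpha/2}\sum_{j=-\infty}^{\infty}|c_j|^{\delta}. \tag{27} \]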

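Hence, by (25) and (27), we finally have, for some $\varepsilon\in(0,1)$, that

\[
\begin{aligned}
\limsup_{n\to\infty}pP\Big(\sum_{t=1}^n\sum_{j=-\infty}^{\infty}c_j^2Z_{t-j}^2 > a_{np}^2x\Big)
&\le \limsup_{n\to\infty}pP\Big(\sum_{t=1}^n\sum_{j=-m}^{m}c_j^2Z_{t-j}^2 > (1-2\varepsilon)a_{np}^2x\Big) \\
&\quad + \limsup_{n\to\infty}pP\Big(\sum_{t=1}^n\sum_{j=m+1}^{\infty}c_j^2Z_{t-j}^2 > \varepsilon a_{np}^2x\Big) \\
&\quad + \limsup_{n\to\infty}pP\Big(\sum_{t=1}^n\sum_{j=-\infty}^{-m-1}c_j^2Z_{t-j}^2 > \varepsilon a_{np}^2x\Big) \\
&\le x^{-\alpha/2}\Big[(1-2\varepsilon)^{-\alpha/2}\Big(\sum_{j=-m}^{m}c_j^2\Big)^{\alpha/2} + C\varepsilon^{-\alpha/2}\sum_{j=m+1}^{\infty}|c_j|^{\delta} + C\varepsilon^{-\alpha/2}\sum_{j=-\infty}^{-m-1}|c_j|^{\delta}\Big].
\end{aligned}
\tag{28}
\]

Assumption (5) shows that the last two terms in (28) vanish for $m\to\infty$. Letting $\varepsilon\to0$ thereafter completes the proof. □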

By virtue of the previous proposition we obtain the point process convergence of the diagonal elements of the sample covariance matrix $XX^T$. This immediately characterizes their extremal behavior. Note that this result holds without any restriction on $\beta$ even if $\alpha > 1$.

Proposition 3.6. Let $X = (X_{it})$ be as in equations (Q]), (J4]) and © with $\alpha\in(0,2)$, and suppose that © holds for some $\beta > 0$. Then

\[ \sum_{i=1}^{p}\varepsilon_{a_{np}^{-2}\sum_{t=1}^nX_{it}^2} \xrightarrow[n\to\infty]{D} N = \sum_{i=1}^{\infty}\varepsilon_{\Gamma_i^{-2/\alpha}\sum_jc_j^2} \tag{29} \]

with $(\Gamma_i)$ as in ©.



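Proof. For notational simplicity we assume without loss of generality that $X_{it} = \sum_{j=0}^{\infty}c_jZ_{i,t-j}$. The extension to the non-causal case is obvious. We first prove the claim for finite linear processes $X_{it,m} = \sum_{j=0}^{m}c_jZ_{i,t-j}$. From Proposition 3.5 we already have that

\[ \sum_{i=1}^{p}\varepsilon_{a_{np}^{-2}\sum_{t=1}^n\sum_{j=0}^mc_j^2Z_{i,t-j}^2} \xrightarrow[n\to\infty]{D} \sum_{i=1}^{\infty}\varepsilon_{\Gamma_i^{-2/\alpha}\sum_{j=0}^mc_j^2}. \tag{30} \]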
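Thus it is only left to show that all terms involving cross products are negligible. By [18, Theorem 4.2] it suffices to show, for any $\eta > 0$, that

\[ \lim_{n\to\infty}P\Big(\Big|\sum_{i=1}^pf\Big(a_{np}^{-2}\sum_{t=1}^nX_{it,m}^2\Big) - \sum_{i=1}^pf\Big(a_{np}^{-2}\sum_{t=1}^n\sum_{j=0}^mc_j^2Z_{i,t-j}^2\Big)\Big| > \eta\Big) = 0 \tag{31} \]

for any continuous function $f:\mathbb R_+\to\mathbb R_+$ with compact support $\operatorname{supp}(f)\subset[c,\infty]$ and $c > 0$. Choose some $0 < \gamma < c$ and let $K = [c-\gamma,\infty]$. On the set

\[ A_n^{\gamma} = \Big\{a_{np}^{-2}\max_{1\le i\le p}\Big|\sum_{t=1}^n\Big(X_{it,m}^2 - \sum_{j=0}^mc_j^2Z_{i,t-j}^2\Big)\Big| \le \gamma\Big\} \]

the following is true: if $a_{np}^{-2}\sum_{t=1}^n\sum_{j=0}^mc_j^2Z_{i,t-j}^2 \notin K$, then the absolute difference in (31) is zero, else it is bounded by the modulus of continuity $\omega(\gamma) = \sup\{|f(x)-f(y)| : |x-y|\le\gamma\}$. Hence, the probability in (31) is bounded by

\[ P\Big(\omega(\gamma)\sum_{i=1}^p\varepsilon_{a_{np}^{-2}\sum_{t=1}^n\sum_{j=0}^mc_j^2Z_{i,t-j}^2}(K) > \eta\Big) + P\big((A_n^{\gamma})^c\big). \]

By (30), the first summand converges to

\[ P\Big(\omega(\gamma)\sum_{i=1}^{\infty}\varepsilon_{\Gamma_i^{-2/\alpha}\sum_{j=0}^mc_j^2}(K) > \eta\Big). \]

Since $\sum_{i=1}^{\infty}\varepsilon_{\Gamma_i^{-2/\alpha}\sum_{j=0}^mc_j^2}(K) < \infty$ and $\omega(\gamma)\to0$ as $\gamma\to0$, this probability approaches zero as $\gamma$ tends to zero. To show that

\[ P\big((A_n^{\gamma})^c\big) \le P\Big(2\sum_{j=0}^{m-1}\sum_{k=j+1}^{m}|c_jc_k|\max_{1\le i\le p}\sum_{t=1}^n|Z_{i,t-j}Z_{i,t-k}| > a_{np}^2\gamma\Big) \to 0 \tag{32} \]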



we use the following observation for fixed $j\in\{0,\dots,m-1\}$ and $k\in\{j+1,\dots,m\}$: the product $Z_{i,t-j}Z_{i,t-k}$ has, because of independence, tail index $\alpha$, and $Z_{i,t-j}Z_{i,t-k}$ and $Z_{i,s-j}Z_{i,s-k}$ are independent if and only if $|s-t| \ne k-j$. Thus, we partition the natural numbers $\mathbb N$ into $k-j+1$ pairwise disjoint sets $s + (k-j+1)\mathbb N_0$, $s\in\{0,\dots,k-j\}$. Then we have, by Proposition 3.2 and the independence of the summands, that

\[ a_{np}^{-2}\max_{1\le i\le p}\sum_{\substack{t=1\\ t\in s+(k-j+1)\mathbb N_0}}^{n}|Z_{i,t-j}Z_{i,t-k}| \xrightarrow[n\to\infty]{P} 0 \]

for each $s\in\{0,\dots,k-j\}$. Since $j,k$ only vary over finite sets this implies (32). Thus we have shown (29) for a finite order moving average $X_{it,m}$.
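The partition argument above relies on one combinatorial fact: within a residue class modulo $k-j+1$, no two time indices differ by exactly $k-j$, so the products inside one class never share a noise variable. The following sketch verifies this for one hypothetical choice of `n`, `j` and `k`; the helper names are illustrative.

```python
def residue_classes(n, lag):
    """Split {1, ..., n} into lag + 1 classes s + (lag + 1) * N_0."""
    modulus = lag + 1
    classes = [[] for _ in range(modulus)]
    for t in range(1, n + 1):
        classes[t % modulus].append(t)
    return classes

def has_conflict(indices, lag):
    """True if two indices in the same class differ by exactly `lag`."""
    index_set = set(indices)
    return any(t + lag in index_set for t in indices)

n, j, k = 50, 1, 4
# Z_{t-j} Z_{t-k} and Z_{s-j} Z_{s-k} share a factor iff |s - t| = k - j.
lag = k - j
classes = residue_classes(n, lag)
print(len(classes), [has_conflict(c, lag) for c in classes])
```

Because differences inside a class are multiples of $k-j+1$, they can never equal $k-j$, which is what makes the summands within each class independent.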
Now we let m go to infinity. Clearly, we have that 



a„ n max 

' \<i<p 



Therefore we have 



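\[ \sum_{i=1}^{\infty}\varepsilon_{\Gamma_i^{-2/\alpha}\sum_{j=0}^{m}c_j^2} \xrightarrow[m\to\infty]{D} \sum_{i=1}^{\infty}\varepsilon_{\Gamma_i^{-2/\alpha}\sum_{j=0}^{\infty}c_j^2}. \tag{33} \]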

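Thus, by |Hl Theorem 3.2], it is only left to show that

\[ \lim_{m\to\infty}\limsup_{n\to\infty}P\Big(\Big|\sum_{i=1}^pf\Big(a_{np}^{-2}\sum_{t=1}^nX_{it}^2\Big) - \sum_{i=1}^pf\Big(a_{np}^{-2}\sum_{t=1}^nX_{it,m}^2\Big)\Big| > \eta\Big) = 0. \tag{34} \]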
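By repeating the previous arguments, it suffices to show

\[ \limsup_{n\to\infty}P\Big(a_{np}^{-2}\max_{1\le i\le p}\sum_{t=1}^n|X_{it}^2 - X_{it,m}^2| > \gamma\Big) \le \limsup_{n\to\infty}pP\Big(a_{np}^{-2}\sum_{t=1}^n|X_{1t}^2 - X_{1t,m}^2| > \gamma\Big) \to 0, \]

as $m\to\infty$. Clearly, we have that

\[ X_{1t}^2 - X_{1t,m}^2 = \sum_{j=m+1}^{\infty}c_j^2Z_{1,t-j}^2 + 2\sum_{j=m+1}^{\infty}\sum_{k=0}^{m}c_jc_kZ_{1,t-j}Z_{1,t-k} + \sum_{j=m+1}^{\infty}\sum_{\substack{k=m+1\\ k\ne j}}^{\infty}c_jc_kZ_{1,t-j}Z_{1,t-k}. \tag{35} \]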
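The decomposition of $X_{1t}^2 - X_{1t,m}^2$ into squared tail terms, mixed head-tail products and off-diagonal tail products is purely algebraic, so it can be checked exactly on a finite example. The sketch below assumes finitely many coefficients as a stand-in for the infinite series; the lists `c` and `z` and the cut-off `m` are hypothetical values.

```python
def decomposition_terms(c, z, m):
    """Three summands of X^2 - X_m^2 for a linear process truncated at lag m.

    c[j] are coefficients and z[j] stands for Z_{1,t-j}; both lists are finite
    here, an illustrative stand-in for the infinite series in the text.
    """
    tail = range(m + 1, len(c))
    head = range(0, m + 1)
    squares = sum(c[j] ** 2 * z[j] ** 2 for j in tail)
    mixed = 2 * sum(c[j] * c[k] * z[j] * z[k] for j in tail for k in head)
    tail_cross = sum(c[j] * c[k] * z[j] * z[k]
                     for j in tail for k in tail if k != j)
    return squares, mixed, tail_cross

c = [3, -2, 1, 4, -1, 2]      # hypothetical integer coefficients c_0, ..., c_5
z = [1, 5, -3, 2, 7, -4]      # hypothetical noise values Z_{1,t}, ..., Z_{1,t-5}
m = 2
x_full = sum(cj * zj for cj, zj in zip(c, z))
x_trunc = sum(c[j] * z[j] for j in range(m + 1))
squares, mixed, tail_cross = decomposition_terms(c, z, m)
print(x_full ** 2 - x_trunc ** 2, squares + mixed + tail_cross)
```

Only the first summand carries the regularly varying squares; the other two are products of distinct noise variables and are handled with the inequality $2|ab| \le a^2 + b^2$.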



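For the first summand on the right hand side of equation (35) we have, by Proposition 3.5, that

\[ pP\Big(\sum_{t=1}^n\sum_{j=m+1}^{\infty}c_j^2Z_{1,t-j}^2 > a_{np}^2\gamma\Big) \xrightarrow[n\to\infty]{} \gamma^{-\alpha/2}\Big(\sum_{j=m+1}^{\infty}c_j^2\Big)^{\alpha/2} \xrightarrow[m\to\infty]{} 0. \]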



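Using Proposition 3.5 and the elementary inequality $2|ab| \le a^2 + b^2$, we obtain for the second term in equation (35) that

\[
\begin{aligned}
pP\Big(2\sum_{t=1}^n\sum_{j=m+1}^{\infty}\sum_{k=0}^{m}|c_jc_kZ_{1,t-j}Z_{1,t-k}| > a_{np}^2\gamma\Big)
&\le pP\Big(\sum_{t=1}^n\sum_{j=m+1}^{\infty}\sum_{k=0}^{m}|c_jc_k|Z_{1,t-j}^2 > \frac{\gamma}{2}a_{np}^2\Big) \\
&\quad + pP\Big(\sum_{t=1}^n\sum_{j=m+1}^{\infty}\sum_{k=0}^{m}|c_jc_k|Z_{1,t-k}^2 > \frac{\gamma}{2}a_{np}^2\Big) \\
&\xrightarrow[n\to\infty]{} 2\Big(\frac{\gamma}{2}\Big)^{-\alpha/2}\Big(\sum_{k=0}^{m}|c_k|\Big)^{\alpha/2}\Big(\sum_{j=m+1}^{\infty}|c_j|\Big)^{\alpha/2},
\end{aligned}
\]

and since $\sum_{j=0}^{\infty}|c_j| < \infty$, this term converges to zero as $m\to\infty$. The third term in equation (35) can be handled similarly. Thus the proof is complete. □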

3.4. Proof of Theorem [2-. In this section we use the foregoing results to complete the proof of Theorem Q].

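Proof of Theorem^ (i) Denote by $S_i = (XX^T)_{ii} = \sum_{t=1}^nX_{it}^2$ the diagonal entries of $XX^T$ and by $S_{(1)} \ge \dots \ge S_{(p)}$ their order statistics. Then Weyl's Inequality, cf. Q, Theorem III.2.1], and Proposition 3.4 imply that

\[ a_{np}^{-2}\max_{1\le k\le p}|\lambda_{(k)} - S_{(k)}| \le a_{np}^{-2}\|XX^T - D\|_2 \xrightarrow[n\to\infty]{P} 0, \tag{36} \]

where $D = \operatorname{diag}(XX^T)$. From Proposition |3 A we have

\[ \hat N_n := \sum_{i=1}^{p}\varepsilon_{a_{np}^{-2}S_i} \xrightarrow[n\to\infty]{D} N. \tag{37} \]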
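Weyl's inequality bounds the distance between the ordered eigenvalues of $XX^T$ and its ordered diagonal entries by the spectral norm of the off-diagonal part. A minimal sketch for the $2\times2$ case follows, where both sides are available in closed form; the data matrix `x`, with one dominant entry per row, is a hypothetical example and not taken from the paper.

```python
import math

def gram_2x2(x):
    """Entries of X X^T for a 2 x n data matrix given as two rows."""
    s1 = sum(v * v for v in x[0])
    s2 = sum(v * v for v in x[1])
    r = sum(u * v for u, v in zip(x[0], x[1]))
    return s1, s2, r

def eigenvalues_2x2(s1, s2, r):
    """Ordered eigenvalues of [[s1, r], [r, s2]] via the closed form."""
    mid = 0.5 * (s1 + s2)
    rad = math.hypot(0.5 * (s1 - s2), r)
    return mid + rad, mid - rad

# Hypothetical heavy-tailed-looking data: one dominant entry per row.
x = [[100.0, 1.0, -2.0, 0.5], [0.3, -1.0, 40.0, 2.0]]
s1, s2, r = gram_2x2(x)
lam = eigenvalues_2x2(s1, s2, r)
diag_sorted = sorted((s1, s2), reverse=True)
max_gap = max(abs(l - s) for l, s in zip(lam, diag_sorted))
off_diag_norm = abs(r)  # spectral norm of X X^T - diag(X X^T) in the 2 x 2 case
print(max_gap, off_diag_norm)
```

In this example the eigenvalues sit much closer to the diagonal entries than the bound requires, which mirrors the heavy-tailed regime where a few large squares dominate each row sum.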



Thus, by [18, Theorem 4.2], it suffices to show that

\[ P\big(|N_n(f) - \hat N_n(f)| > \eta\big) \le P\Big(\sum_{i=1}^p\big|f(a_{np}^{-2}\lambda_{(i)}) - f(a_{np}^{-2}S_{(i)})\big| > \eta\Big) \to 0 \]

for a nonnegative continuous function $f$ with compact support $\operatorname{supp}(f)\subset[c,\infty]$, for some $c > 0$. Since $N((c/2,\infty]) < \infty$ almost surely, we can choose some $i\in\mathbb N$ large enough such that the probability $P(N((c/2,\infty]) \ge i) < \delta/2$. By (22), it follows that $P(a_{np}^{-2}S_{(i)} > c/2) = P(\hat N_n((c/2,\infty]) \ge i) \to P(N((c/2,\infty]) \ge i)$ and thus, for $n$ large enough, $P(a_{np}^{-2}S_{(i)} > c/2) < \delta$. Consequently, by (36), it
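follows that $P(a_{np}^{-2}\lambda_{(i)} > c) < 2\delta$. Since $a_{np}^{-2}S_{(i)} \le c/2$ and $a_{np}^{-2}\lambda_{(i)} \le c$ imply that both $f(a_{np}^{-2}S_{(k)}) = 0$ and $f(a_{np}^{-2}\lambda_{(k)}) = 0$ for all $k \ge i$, we obtain

\[
\begin{aligned}
P\Big(\sum_{j=1}^p\big|f(a_{np}^{-2}\lambda_{(j)}) - f(a_{np}^{-2}S_{(j)})\big| > \eta\Big)
&\le P\Big(\sum_{j=1}^p\big|f(a_{np}^{-2}\lambda_{(j)}) - f(a_{np}^{-2}S_{(j)})\big| > \eta,\ a_{np}^{-2}S_{(i)} \le \frac c2,\ a_{np}^{-2}\lambda_{(i)} \le c\Big) \\
&\quad + P\Big(a_{np}^{-2}S_{(i)} > \frac c2\Big) + P\big(a_{np}^{-2}\lambda_{(i)} > c\big) \\
&\le 3\delta + P\Big(\sum_{j=1}^{i-1}\big|f(a_{np}^{-2}\lambda_{(j)}) - f(a_{np}^{-2}S_{(j)})\big| > \eta\Big),
\end{aligned}
\]

which becomes arbitrarily small due to equation (36) and the fact that $f$ is uniformly continuous.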

(ii) By assumption $X = (Z_{it})$. First we consider the case (a) and assume that $\kappa \ge 1$. We will show that, for any fixed positive integer $k$,

\[ \frac{\lambda_{(k)}}{S_{(k)}} \xrightarrow[n\to\infty]{P} 1. \tag{38} \]

Equations (18) and (38) then imply

\[ \Big|\frac{\lambda_{(k)}}{a_{np}^2} - \frac{S_{(k)}}{a_{np}^2}\Big| = \Big|\frac{\lambda_{(k)}}{S_{(k)}} - 1\Big|\,\frac{S_{(k)}}{a_{np}^2} \xrightarrow[n\to\infty]{P} 0, \]

and hence $N_n \xrightarrow{D} N$ as in the proof of Theorem Q] (i). Define $M_i = \max_{1\le t\le n}Z_{it}^2$ and denote by $M_{(1)} \ge \dots \ge M_{(p)}$ the order statistics of $M_1,\dots,M_p$. Observe that the continuous mapping theorem applied to (18) and (19) yields, for any fixed $k$,

S (k) P 



because k > 1 . Now we start showing 

^(i) 
5(i) 



1, and 



1, 



M(l) n^co 

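by induction. For $k = 1$ we have, on the one hand, that

\[ \frac{\lambda_{(1)}}{S_{(1)}} = \frac{\|X_nX_n^T\|_2}{S_{(1)}} = \frac{\|X_n\|_2^2}{S_{(1)}} \le \frac{\|X_n\|_1\|X_n\|_\infty}{M_{(1)}}\cdot\frac{M_{(1)}}{S_{(1)}} \xrightarrow[n\to\infty]{P} 1. \]

Let us denote by $e_1,\dots,e_p$ the standard Euclidean orthonormal basis in $\mathbb R^p$ and by $i_1$ the (random) index that satisfies $S_{i_1} = S_{(1)}$. Then we have, on the other hand, by the Minimax Principle tA Corollary III.1.2], that

\[ \frac{\lambda_{(1)}}{S_{(1)}} = \max_{\|v\|=1}\frac{v^TXX^Tv}{S_{(1)}} \ge \frac{e_{i_1}^TXX^Te_{i_1}}{S_{(1)}} = \frac{S_{i_1}}{S_{(1)}} = 1. \]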

This shows (38) for $k = 1$. To keep the notation simple, we describe the induction step only for $k = 2$. The arguments for the general case are exactly the same. Denote by $i_2$ the random index such that $S_{i_2} = S_{(2)}$. Let $X_n^{(1)}$ be the $(p-1)\times n$ matrix which is obtained from removing row $i_1$ from $X_n$ and denote by $\varrho_{(1)}$ the largest eigenvalue of $X_n^{(1)}(X_n^{(1)})^T$. Since we have already shown the claim for the largest eigenvalue, it follows that $\varrho_{(1)}/S_{(2)} \to 1$ in probability. By the Cauchy Interlacing Theorem [10, Corollary III.1.5] this implies $\lambda_{(2)}/S_{(2)} \le \varrho_{(1)}/S_{(2)} \to 1$. Another application of the Minimax Principle yields

\[
\begin{aligned}
\lambda_{(2)} &= \max_{\substack{M\subset\mathbb R^p\\ \dim(M)=2}}\ \min_{\substack{v\in M\\ \|v\|=1}}v^TXX^Tv \ \ge \min_{\substack{v\in\operatorname{span}\{e_{i_1},e_{i_2}\}\\ \|v\|=1}}v^TXX^Tv \\
&= \min_{\mu_1,\mu_2}\,(\mu_1^2+\mu_2^2)^{-1}\big(\mu_1^2S_{(1)} + \mu_2^2S_{(2)} + 2\mu_1\mu_2(XX^T)_{i_1i_2}\big).
\end{aligned}
\]



Since, by Proposition 3.2 and equation (18),

\[ \Big|\frac{2\mu_1\mu_2(XX^T)_{i_1i_2}}{(\mu_1^2+\mu_2^2)S_{(2)}}\Big| \le \frac{a_{np}^{-2}\max_{1\le i<j\le p}\sum_{t=1}^n|Z_{it}Z_{jt}|}{a_{np}^{-2}S_{(2)}} \xrightarrow[n\to\infty]{P} 0 \]

uniformly in $\mu_1,\mu_2\in\mathbb R$, an application of the continuous mapping theorem finally yields that $\lambda_{(2)}/S_{(2)} \ge 1 + o_P(1)$, where $o_P(1)\to0$ in probability as $n\to\infty$. Thus the proof for $\kappa \ge 1$ is complete. Now let $\kappa\in(0,1)$. Since $X^TX$ and $XX^T$ have the same non-trivial eigenvalues, we consider the transpose $X^T$ of $X$. This inverts the roles of $p$ and $n$. Therefore, using Potter's bounds and $1/\kappa > 1$, the result follows from the same arguments as before. Note that we are in a special case of Theorem Q] (i) if $\kappa = 0$. In case (b) we have that $n \sim (1/c)\log(p/C)$ is a slowly varying function in $p$, thus an application of Theorem [j] (ii) (a) to $X^T$ gives the result. □

3.5. Proof of Theorem |^. As we shall see, the proof of the theorem will more or less follow the same lines of argument as given for Theorem [H. We focus on the setting of part (i) of the theorem here and mention (ii) and (iii) later. The next result is a generalization of Proposition 3.6 allowing for random coefficients.

Proposition 3.7. Define $X = (X_{it})$ with $X_{it}$ satisfying (11) and (12). Suppose ® holds for some $\beta > 0$. If $(\theta_i)$ is a stationary ergodic sequence, then, conditionally on $(\theta_i)$ as well as unconditionally, we have

\[ \sum_{i=1}^{p}\varepsilon_{a_{np}^{-2}\sum_{t=1}^nX_{it}^2} \xrightarrow[n\to\infty]{D} \sum_{i=1}^{\infty}\varepsilon_{\Gamma_i^{-2/\alpha}\big(E|\sum_jc_j^2(\theta_1)|^{\alpha/2}\big)^{2/\alpha}} \tag{39} \]

with $(\Gamma_i)$ as in ©.

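Proof. We first prove that, conditionally on $(\theta_i)$,

\[ \sum_{i=1}^{p}\varepsilon_{a_{np}^{-2}\sum_{t=1}^n\sum_jc_j^2(\theta_i)Z_{i,t-j}^2} \xrightarrow[n\to\infty]{D} \sum_{i=1}^{\infty}\varepsilon_{\Gamma_i^{-2/\alpha}\big(E|\sum_jc_j^2(\theta_1)|^{\alpha/2}\big)^{2/\alpha}} \tag{40} \]

by showing a.s. convergence of the Laplace functionals. By arguments from the proof of [26, Proposition 3.17] it suffices to show this convergence only for a countable subset of the space of all nonnegative continuous functions with compact support. Thus we fix one nonnegative continuous function $f$ with compact support $\operatorname{supp}(f)\subset[c,\infty]$, $c > 0$. Conditionally on the process $(\theta_i)$, the points of the point process are independent, and thus

\[ E\Big(\exp\Big(-\sum_{i=1}^pf\Big(a_{np}^{-2}\sum_{t=1}^n\sum_jc_j^2(\theta_i)Z_{i,t-j}^2\Big)\Big)\,\Big|\,(\theta_i)\Big) = \prod_{i=1}^{p}\Big(1 - \frac{B_{i,p}}{p}\Big), \tag{41} \]

where $B_{i,p} = \int\big(1 - e^{-f(x)}\big)\,pP\big(a_{np}^{-2}\sum_{t=1}^n\sum_jc_j^2(\theta_i)Z_{i,t-j}^2\in dx\,\big|\,\theta_i\big)$. First assume

\[ \frac1p\sum_{i=1}^{p}B_{i,p} \xrightarrow[n\to\infty]{} B := \int\big(1 - e^{-f(x)}\big)\,\nu(dx) \quad\text{a.s.}, \tag{42} \]

with $\nu$ given by $\nu((x,\infty]) := x^{-\alpha/2}E\big|\sum_jc_j^2(\theta_1)\big|^{\alpha/2}$, and

\[ \frac1{p^2}\sum_{i=1}^{p}B_{i,p}^2 \xrightarrow[n\to\infty]{} 0 \quad\text{a.s.} \tag{43} \]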

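Both claims will be justified later. By assumption (11), we have, using Proposition 3.5, almost surely

\[ B_{i,p} \le pP\Big(a_{np}^{-2}\sum_{t=1}^n\sum_j\tilde c_j^{\,2}Z_{i,t-j}^2 > c\Big) \xrightarrow[n\to\infty]{} c^{-\alpha/2}\Big(\sum_j\tilde c_j^{\,2}\Big)^{\alpha/2}, \]

where $(\tilde c_j)$ denotes the deterministic bound on the coefficients from (11), and hence there exists a $C > 0$ such that $B_{i,p} \le C$ for all $i,p\in\mathbb N$ a.s. The elementary inequality $e^{-x/(1-x)} \le 1 - x \le e^{-x}$ for all $x\in[0,1)$, equivalently $e^{-x^2/(1-x)} \le (1-x)e^{x} \le 1$ for all $x\in[0,1)$, implies together with (43), for some $c_1 > 0$, that

\[ 1 \ge \prod_{i=1}^{p}\Big(1 - \frac{B_{i,p}}{p}\Big)e^{B_{i,p}/p} \ge \prod_{i=1}^{p}\exp\Big(-\frac{(B_{i,p}/p)^2}{1 - B_{i,p}/p}\Big) \ge \exp\Big(-\frac{c_1}{p^2}\sum_{i=1}^{p}B_{i,p}^2\Big) \xrightarrow[n\to\infty]{} 1. \]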
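The step from the product of conditional Laplace factors to an exponential rests on the sandwich $e^{-x/(1-x)} \le 1 - x \le e^{-x}$ on $[0,1)$. The sketch below checks the sandwich on a grid and then compares $\prod_i(1 - B_i/p)$ with $\exp(-p^{-1}\sum_i B_i)$ for a bounded sequence; the values `b_values` are hypothetical stand-ins for $B_{i,p}$.

```python
import math

def sandwich_holds(x):
    """Check exp(-x / (1 - x)) <= 1 - x <= exp(-x) for 0 <= x < 1."""
    lower = math.exp(-x / (1.0 - x))
    upper = math.exp(-x)
    tol = 1e-15  # guards against rounding at x = 0, where all three equal 1
    return lower <= 1.0 - x + tol and 1.0 - x <= upper + tol

grid = [i / 1000.0 for i in range(0, 1000)]   # 0, 0.001, ..., 0.999
all_ok = all(sandwich_holds(x) for x in grid)

def product_vs_exponential(b_values):
    """Compare prod(1 - B_i/p) with exp(-mean(B_i)) for bounded B_i."""
    p = len(b_values)
    prod = 1.0
    for b in b_values:
        prod *= 1.0 - b / p
    return prod, math.exp(-sum(b_values) / p)

b_values = [0.5 + (i % 3) for i in range(3000)]  # hypothetical bounded B_{i,p}
prod, limit = product_vs_exponential(b_values)
print(all_ok, prod, limit)
```

The relative gap between the two quantities is of order $p^{-2}\sum_i B_i^2$, which is exactly the term that (43) requires to vanish.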



As a consequence we have that the product in (41) is asymptotically equivalent to

\[ \prod_{i=1}^{p}e^{-B_{i,p}/p} = \exp\Big(-\frac1p\sum_{i=1}^{p}B_{i,p}\Big) \xrightarrow[n\to\infty]{} e^{-B}, \]

where the convergence follows from (42). This implies the almost sure convergence of the conditional Laplace functionals, therefore (40) holds conditionally on $(\theta_i)$. Using (11) one shows similarly as in the proof of Proposition [53, conditionally on $(\theta_i)$, that (40) implies (39). Taking the expectation yields that (39) also holds unconditionally.

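Proof of (42) and (43). As a function in $x$, $pP\big(\sum_{t=1}^nZ_t^2 > a_{np}^2x\big)$ is decreasing and converges pointwise to the continuous function $x^{-\alpha/2}$ as $n\to\infty$. Therefore this convergence is uniform on compact intervals of the form $[x_0,\infty]$ with $x_0 > 0$. Now fix $x > 0$ and let $d_i = \sum_jc_j^2(\theta_i)$. Since $d_i \le d = \sum_j\tilde c_j^{\,2} < \infty$ for all $i\in\mathbb N$, $x/d_i \ge x/d$ is bounded from below, and thus

\[ \sup_{i\in\mathbb N}\Big|pP\Big(\sum_{t=1}^nZ_t^2 > a_{np}^2\frac{x}{d_i}\Big) - \Big(\frac{x}{d_i}\Big)^{-\alpha/2}\Big| \xrightarrow[n\to\infty]{} 0. \tag{44} \]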



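Since $(d_i)$ is an instantaneous function of the ergodic sequence $(\theta_i)$, it is also ergodic and thus

\[ \frac1p\sum_{i=1}^{p}|d_i|^{\alpha/2} \xrightarrow[p\to\infty]{} E|d_1|^{\alpha/2} \quad\text{a.s.} \tag{45} \]

As a consequence of (44) and (45) we obtain

\[ \frac1p\sum_{i=1}^{p}pP\Big(\sum_{t=1}^nZ_t^2 > a_{np}^2\frac{x}{d_i}\Big) - x^{-\alpha/2}E|d_1|^{\alpha/2} \xrightarrow[n\to\infty]{} 0 \quad\text{a.s.} \]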
Then it is straightforward to show, as in the proof of Proposition 3.5, using (11), that

\[ \frac1p\sum_{i=1}^{p}pP\Big(\sum_{t=1}^n\sum_jc_j^2(\theta_i)Z_{i,t-j}^2 > a_{np}^2x\,\Big|\,\theta_i\Big) \xrightarrow[n\to\infty]{} x^{-\alpha/2}E|d_1|^{\alpha/2} \quad\text{a.s.} \]

The vague convergence of the above sequence of measures implies $p^{-1}\sum_{i=1}^pB_{i,p} \to B$ almost surely. In exactly the same way one can show that $p^{-1}\sum_{i=1}^pB_{i,p}^2$ converges, thus $p^{-2}\sum_{i=1}^pB_{i,p}^2 \to 0$ a.s., which establishes (42) and (43) as claimed. □



Proof of Theorem^ Proof of (i). If we condition on $(\theta_i)$, the proofs of Propositions 3.3 and 3.4 easily carry over to this more general setting when we make use of assumption (11). Taking the expectation then yields convergence in operator norm unconditionally. A combination of this together with Proposition 3.7 completes the proof.

Proof of (ii). Note that (45) is the only step in the proof of Proposition 3.7 where we use the ergodicity of the sequence $(\theta_i)$. But also if $(\theta_i)$ is just stationary, the ergodic theorem implies that the average in (45) converges to the random variable $Y = E\big(|d_1|^{\alpha/2}\,\big|\,\mathcal G\big)$, where $\mathcal G$ is the invariant $\sigma$-field generated by $(\theta_i)$. By construction, $Y$ depends on $\alpha$ and $c_j(\cdot)$, but it is independent of $(\Gamma_i)$, since $(\theta_i)$ is independent of $(Z_{it})$.

Proof of (iii). In this setting $(\theta_i)$ is a Markov chain which may not be stationary. But since we derive all results in the proof of part (i) of the theorem conditionally on $(\theta_i)$ and then take the expectation, stationarity is in fact not needed. The theory on Markov chains, see [22], in particular their Theorem 17.1.7 for Markov chains on uncountable state spaces, shows that (45) holds if the expectation is taken with respect to the stationary distribution of the Markov chain. □